User-Defined Parameters

Parameter              Value                     Variable
Series Code            21864                     ts
Maximum Lag            48                        af.lags
Forecast Horizon       12                        n.ahead
Information Criterion  BIC                       inf.crit
Unit Root Test         ADF, drift, 6, BIC, 5pct  ur.test
ARCH Test              12, FALSE, 0.05           arch.test
Box Test               1, Ljung-Box, 0           box.test
Dummy                  FALSE                     dummy
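These parameters can be declared as R objects at the top of the report. A minimal sketch (the list field names for ur.test, arch.test and box.test are assumptions, chosen to match how the lists are consumed below):

```r
# Parameter setup (sketch; list field names are assumptions)
ts <- 21864                     # BETS series code
af.lags <- 48                   # maximum lag for the correlograms
n.ahead <- 12                   # forecast horizon
inf.crit <- "BIC"               # information criterion for model selection
ur.test <- list(mode = "ADF", type = "drift", lags = 6,
                selectlags = "BIC", level = "5pct")
arch.test <- list(lags = 12, demean = FALSE, alpha = 0.05)
box.test <- list(lag = 1, type = "Ljung-Box", fitdf = 0)
dummy <- FALSE
```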

Getting the Time Series from the BETS database

library(BETS)
data <- BETS.get(ts)

Information About the Series

info <- BETS.search(code = ts, view = F)
Code   Description                               Periodicity  Start       Source  Unit
21864  Physical Production - Intermediate goods  M            01/01/2002  IBGE    Index

Graph

library(mFilter)
trend = fitted(hpfilter(data))

library(dygraphs)
dygraph(cbind(Series = data, Trend = trend), main = info[,"Description"]) %>%
  dyRangeSelector(strokeColor = "gray", fillColor = "gray") %>%
    dyAxis("y", label = info[,"Unit"])
[Dygraph: "Physical Production - Intermediate goods" (Index), Series and Trend lines, through mar/2017]

Unit Root Tests

Augmented Dickey-Fuller

test.params = append(list(y = data), ur.test)
df = do.call(BETS.ur_test, test.params)
df$results
##      statistic crit.val rej.H0
## tau2 -4.118928    -2.88    yes
## phi1  8.488818     4.63     no

At the 5% significance level, the test statistic tau2 (-4.12) is smaller than the critical value (-2.88), so the null hypothesis of a unit root is rejected. We therefore conclude that there is no non-seasonal unit root.
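The decision rule applied above can be stated in one line of base R, using the values from the output (no BETS dependency):

```r
# ADF decision rule: reject H0 (unit root) when the tau statistic is
# smaller (more negative) than the critical value
stat <- -4.118928
crit <- -2.88
rejects_unit_root <- stat < crit   # TRUE: no non-seasonal unit root
```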

ns_roots = 0
d_ts = data 

Osborn-Chui-Smith-Birchenhall

This test is performed at lag 12, that is, the frequency of the monthly series 21864.

library(forecast)
s_roots = nsdiffs(data)
print(s_roots)
## [1] 0

According to the OCSB test, there is no seasonal unit root, at least at a 5% significance level.
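Had nsdiffs returned a positive value, the series would need seasonal differencing before estimation. In base R that is simply a diff at the series frequency, illustrated here on AirPassengers as a stand-in monthly series:

```r
# One seasonal difference at lag 12 removes a seasonal unit root;
# each seasonal difference costs frequency(y) observations
y <- log(AirPassengers)
dy <- diff(y, lag = frequency(y))
length(y) - length(dy)   # 12 observations lost
```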

Auto-Correlation Functions

ACF and PACF - Original Series

BETS.corrgram(d_ts, lag.max = af.lags, mode = "bartlett", knit = T)
[Correlogram: ACF of the original series, lags 1 to 48]
BETS.corrgram(d_ts, lag.max = af.lags, mode = "simple", type = "partial", knit = T)
[Correlogram: PACF of the original series, lags 1 to 48]

Model Identification and Estimation

The correlograms from the last section give us enough information to try to identify the parameters of the underlying SARIMA model. We can confirm our guess by running the auto.arima function from the forecast package. By default, this function uses the AICc (Akaike Information Criterion with a finite-sample correction) for model selection. Here, we are going to use the BIC (Bayesian Information Criterion), whose penalty term for the number of model parameters is larger than AIC's.
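The BIC-versus-AIC trade-off can be seen with base R's arima() and BIC() on a simulated series (a sketch, independent of the forecast package):

```r
# BIC penalizes each extra parameter by log(n) rather than AIC's 2,
# so it tends to favor more parsimonious models
set.seed(123)
y <- arima.sim(model = list(ar = 0.8), n = 200)  # true AR(1) process
m1 <- arima(y, order = c(1, 0, 0))               # parsimonious candidate
m2 <- arima(y, order = c(2, 0, 1))               # over-parameterized candidate
c(AIC(m1), BIC(m1))
c(AIC(m2), BIC(m2))
```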

model <- auto.arima(data, ic = tolower(inf.crit), test = tolower(ur.test$mode), 
                   max.d = ns_roots, max.D = s_roots)
summary(model)
## Series: data 
## ARIMA(3,0,1)(1,0,0)[12] with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3      ma1    sar1     mean
##       1.3585  -0.1123  -0.3139  -0.6598  0.8484  89.5756
## s.e.  0.1543   0.1741   0.0710   0.1557  0.0368   5.1006
## 
## sigma^2 estimated as 7.783:  log likelihood=-469.63
## AIC=953.25   AICc=953.87   BIC=975.98
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.1335656 2.745405 2.078356 0.06256712 2.227039 0.4780917
##                     ACF1
## Training set -0.02283119

We see that, according to BIC, the best model is a SARIMA(3,0,1)(1,0,0)[12]. Nevertheless, this is not the end. We still have to test for heteroskedasticity in the residuals. We can use an ARCH test with this purpose.
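The LM form of the ARCH test, which BETS.arch_test presumably wraps (an assumption; the BETS internals may differ), can be sketched in base R: regress the squared residuals on their own lags and compare n·R² to a chi-squared distribution:

```r
# ARCH LM test sketch: under H0 (no ARCH effects), n * R^2 of the
# auxiliary regression is approximately chi-squared with `lags` df
arch_lm <- function(e, lags = 12) {
  e2 <- e^2
  X <- embed(e2, lags + 1)        # col 1: e2_t; cols 2..: e2_{t-1..t-lags}
  fit <- lm(X[, 1] ~ X[, -1])
  stat <- nrow(X) * summary(fit)$r.squared
  p.value <- pchisq(stat, df = lags, lower.tail = FALSE)
  list(statistic = stat, p.value = p.value)
}

set.seed(1)
at_sketch <- arch_lm(rnorm(500), lags = 12)  # homoskedastic input
```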

arch.params <- append(list(x = resid(model)), arch.test)
at <- do.call(BETS.arch_test, arch.params)
at
##   statistic    p.value  htk
## 1  20.00048 0.06707697 TRUE

Although the p-value of 0.07 is slightly above the 0.05 significance level, the test's htk flag marks the residuals as heteroskedastic. This means we should have built the SARIMA model on the log of the original series.

The next step, then, is to rebuild the model using log(data).

Unit root tests (non-seasonal)

data <- log(data)

ns_roots = 0
d_ts = data
test.params = append(list(y = d_ts), ur.test)
df = do.call(BETS.ur_test,test.params)
while(df$results[1,"statistic"] > df$results[1,"crit.val"]){
    ns_roots = ns_roots + 1
    d_ts = diff(d_ts)
    test.params = append(list(y = d_ts), ur.test)
    df = do.call(BETS.ur_test,test.params)
}

ns_roots
## [1] 0

Seasonal unit root test

s_roots = nsdiffs(data)
print(s_roots)
## [1] 0
if(s_roots != 0) {
   d_ts <- diff(d_ts, lag = frequency(data), differences = s_roots) 
}

s_roots
## [1] 0

ACF and PACF

BETS.corrgram(d_ts, lag.max = af.lags, mode = "bartlett", knit = T)
[Correlogram: ACF of the log series, lags 1 to 48]
BETS.corrgram(d_ts, lag.max = af.lags, mode = "simple", type = "partial", knit = T)
[Correlogram: PACF of the log series, lags 1 to 48]

Estimation

model = auto.arima(data, ic = tolower(inf.crit), test = tolower(ur.test$mode), 
                   max.d = ns_roots, max.D = s_roots)
summary(model)
## Series: data 
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean 
## 
## Coefficients:
##          ar1    sar1    mean
##       0.8559  0.8404  4.4873
## s.e.  0.0368  0.0362  0.0759
## 
## sigma^2 estimated as 0.001003:  log likelihood=379.68
## AIC=-751.36   AICc=-751.15   BIC=-738.37
## 
## Training set error measures:
##                       ME       RMSE        MAE        MPE      MAPE
## Training set 0.001214914 0.03142528 0.02416248 0.02253483 0.5333905
##                  MASE       ACF1
## Training set 0.519988 -0.1226789

The best model now seems to be a SARIMA(1,0,0)(1,0,0)[12].

ARCH Test

arch.params <- append(list(x = resid(model)), arch.test)
at <- do.call(BETS.arch_test, arch.params)
at
##   statistic     p.value   htk
## 1  28.02658 0.005482736 FALSE

The htk flag is now FALSE, so we conclude that the residuals of the second model are not heteroskedastic, which makes this model a better choice than the first.

The next function outputs the model's standardized residuals. If they all fall inside the confidence interval, the behaviour of the series was well captured by the model. If a few residuals fall outside the interval, a dummy to handle structural breaks should be considered. But if most residuals fall outside the interval, a SARIMA model may not be the appropriate choice.

rsd <- BETS.std_resid(model, alpha = 0.01)
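A minimal sketch of the idea behind BETS.std_resid (an assumption about its internals: divide the residuals by their estimated standard deviation and flag points outside the normal band):

```r
# Standardized residuals: z_t = e_t / sd(e); with alpha = 0.01 the
# band is +/- qnorm(0.995), roughly 2.58
std_resid_sketch <- function(e, alpha = 0.01) {
  z <- e / sd(e)
  band <- qnorm(1 - alpha / 2)
  list(std = z, outside = which(abs(z) > band), band = band)
}

set.seed(7)
sk <- std_resid_sketch(rnorm(150), alpha = 0.01)
```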

As a final step in model evaluation, we are going to apply the Ljung-Box test to check for autocorrelation in the residuals.

test.params <- append(list(x = resid(model)), box.test)
bt = do.call(Box.test,test.params)
bt$p.value
## [1] 0.08831017

The p-value is greater than 0.05, which means there is not enough statistical evidence to reject the null hypothesis of no autocorrelation in the residuals. This is a desirable result.
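When the tested series is a model's residuals, the fitdf argument of Box.test would normally equal the number of fitted ARMA coefficients, so that the chi-squared degrees of freedom become lag - fitdf. A sketch with hypothetical values (these differ from the box.test settings above, which use lag = 1 and fitdf = 0):

```r
# Ljung-Box on stand-in residuals, adjusting the df for 3 estimated
# coefficients (e.g. ar1, sar1 and the mean)
set.seed(42)
e <- rnorm(200)                  # stand-in for model residuals
bt2 <- Box.test(e, lag = 12, type = "Ljung-Box", fitdf = 3)
bt2$parameter                    # df = lag - fitdf = 9
```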

Forecasts

BETS.predict(model,h=n.ahead, main = info[,"Description"], ylab = info[,"Unit"], knit = T)
[Forecast plot: "Physical Production - Intermediate goods" (log scale), Actual and Predicted values, through mar/2017]